We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provide efficient stopping criteria for the approximation. The latter is the first extension of statistical model checking to unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.